High-Performance Multi-Pass Unification Parsing

Author

Paul Wesley Placeway

Abstract

Parsing natural language is an attempt to discover some structure in a text (or textual representation) generated by a person. This structure can be put to a variety of uses, including machine translation, grammar conformance checking, and determination of prosody in text-to-speech tasks. Recent theories of syntax use unification to better describe the intricacies of natural language [137]. For parsing systems, unification techniques have been either added to a context-free base system [152, 40, 4, 23], or replaced the context-free base entirely [118, 135, 45] (possibly putting it back later [136]). The seemingly small step of adding unification has opened a Pandora’s Box of computational complexity, increasing the difficulty of the problem from polynomial [48] to somewhere between NP-complete and intractable, depending on the details of the unification system and how it was added [10]. Worse, unification on a context-free base parser can break the packing technique used to address the problem of ambiguity, leading to exponential blow-ups of the parser’s performance in both space and time in practice. I propose the use of a multi-pass strategy to avoid these problems in practice. I describe a parser which combines the use of shallow, simple value unification with some approximation techniques in order to find a covering packed parse forest. This parse forest is then searched for a single-best fully-unifying value; the scoring system which drives the heuristic search encodes linguistically-based disambiguation preferences. The resulting two-pass parser is compared to an ordinary single-pass parser in the context of a heavy-weight knowledge-based machine translation system. The two-pass parser is shown to be competitive with the single-pass parser on average data, both in terms of time and space. It is also shown to be able to avoid a common class of ambiguity blow-up that the single-pass parser is subject to.
These results indicate that the multi-pass technique, interleaving some of the unification equations in the parse, is the superior approach for heavy-weight unification parsing.

Similar resources

A Comparative Study of the Impact of Part-of-Speech Tagging on Parsing in Automatic Processing of the Persian Language

In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...

Improving Multi-pass Transition-Based Dependency Parsing Using Enhanced Shift Actions

In multi-pass transition-based dependency parsing algorithms, the shift actions are usually inconsistent for the same node pair in different passes. Some node pairs do have a dependency relation, but the modifier node is not yet a complete subtree. The bottom-up parsing strategy requires performing a shift action for these node pairs. In this paper, we propose a method to improve perform...

Computing Phrasal-signs in HPSG prior to Parsing

This paper describes techniques to compile lexical entries in HPSG (Pollard and Sag, 1987; Pollard and Sag, 1993)-style grammar into a set of finite state automata. The states in automata are possible signs derived from lexical entries and contain information raised from the lexical entries. The automata are augmented with feature structures used by a partial unification routine and delayed/froze...

Efficient Multi-Pass Decoding for Synchronous Context Free Grammars

We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model. The trigram pass closes most of the performance gap between a bigram decoder and a much...
